Speech-to-Speech

ElevenLabs

Hume AI

Recent speech language models

GLM-4-Voice

Trying out J-Moshi

Moshi: a speech-text foundation model for real-time dialogue

Soundwave: Less is More for Speech-Text Alignment in LLMs

Crossing the uncanny valley of conversational voice

Efficient and Direct Duplex Modeling for Speech-to-Speech Language Model

Using smart-turn turn detection to determine in real time whether the user is still speaking

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Thai Semantic End-of-Turn Detection for Real-Time Voice Agents